Compressed String Dictionary Look-Up with Edit Distance One

Authors

  • Djamal Belazzougui
  • Rossano Venturini

Abstract

In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. Given a pattern P, the index has to report all the strings in the dictionary within edit distance at most one of P. Our first solution answers queries in (almost optimal) O(|P| + occ) time, where occ is the number of strings in the dictionary within edit distance at most one of P. The space complexity of this solution is bounded in terms of the k-th order entropy of the indexed dictionary. Our second solution further improves this space complexity at the cost of increasing the query time.
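To make the query semantics concrete, here is a minimal uncompressed sketch, assuming plain Python strings and a list as the "dictionary". It is not the compressed index described above and does not achieve the O(|P| + occ) bound: it scans every string. The single-candidate test uses a standard observation (our choice for illustration, not necessarily the paper's technique): two strings of equal length are within one substitution iff their longest common prefix and longest common suffix together cover all but at most one position, and a string one character longer than P is one insertion away iff its common prefix and suffix with P together cover all of P. The helper names `lcp`, `lcs`, `within_edit_distance_one`, and `query` are hypothetical.

```python
from typing import Iterable, List


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def lcs(a: str, b: str) -> int:
    """Length of the longest common suffix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[-1 - i] == b[-1 - i]:
        i += 1
    return i


def within_edit_distance_one(p: str, s: str) -> bool:
    """True iff the (Levenshtein) edit distance between p and s is at most one."""
    if abs(len(p) - len(s)) > 1:
        return False
    if len(p) == len(s):
        # Equal strings, or exactly one substitution: every mismatch must lie
        # in the gap left between the common prefix and the common suffix.
        return lcp(p, s) + lcs(p, s) >= len(p) - 1
    # Lengths differ by one: the shorter string must be the longer one with
    # a single character removed, so prefix and suffix must cover the shorter.
    short, long_ = (p, s) if len(p) < len(s) else (s, p)
    return lcp(short, long_) + lcs(short, long_) >= len(short)


def query(dictionary: Iterable[str], p: str) -> List[str]:
    """Hypothetical baseline: report all strings within edit distance one of p
    by a linear scan (time proportional to the total dictionary length)."""
    return [s for s in dictionary if within_edit_distance_one(p, s)]


if __name__ == "__main__":
    words = ["cat", "cart", "bat", "at", "dog", "cast", "act"]
    print(query(words, "cat"))  # ['cat', 'cart', 'bat', 'at', 'cast']
```

Note that "act" is not reported: under plain edit distance a transposition costs two operations. Each candidate test runs in time linear in the shorter string, which is why prefix and suffix matching is a natural building block for faster indexes that avoid scanning the whole dictionary.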

Similar resources

Dictionary Look-Up within Small Edit Distance

Let W be a dictionary consisting of n binary strings of length m each, represented as a trie. The usual d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. We present an algorithm to determine if there is a member in W within edit distance d of a given query string q of length m. The method takes time O(dm^{d+1}) in the RAM model, independent of ...

Efficient approximate dictionary look-up over small alphabets

Given a dictionary W consisting of n binary strings of length m each, a d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 [10] as a challenge to data structure design. Efficient solutions have been developed only for the special case when d = 1 (the 1-query problem). We assume the standard RA...

Approximate string matching algorithms for limited-vocabulary OCR output correction

Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, including an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian a...

A fast algorithm for finding the nearest neighbor of a word in a dictionary

In this paper a new algorithm for string edit distance computation is proposed. It is based on the classical approach [11]. However, while in [11] the two strings to be compared may be given online, our algorithm assumes that one of the two strings to be compared is a dictionary entry that is known a priori. This dictionary word is converted, in an off-line phase to be carried out beforehand, in...

Algorithme de recherche approximative dans un dictionnaire fondé sur une distance d'édition définie par blocs (An approximate dictionary search algorithm based on a block-defined edit distance)

We propose an algorithm for approximate dictionary lookup, where altered strings are matched against reference forms. The algorithm makes use of a divergence function between strings, broadly belonging to the family of edit distances; it finds dictionary entries whose distance to the search string is below a certain threshold. The divergence function is not the classical edit distance (DL dis...

Publication date: 2012